Paraphrase Extraction from Validated Question Answering Corpora in Spanish

Authors

  • Jesús Herrera
  • Anselmo Peñas
  • M. Felisa Verdejo
Abstract

Building on the debate around the definition of paraphrase, this work aims to clarify empirically what humans consider a paraphrase. The experiment takes as its starting point one of the several campaigns that generate large amounts of validated textual data every year, data that can be reused for different purposes. This paper describes in detail a simple method, based on pattern matching and on deletion and insertion operations, that extracts a remarkable number of paraphrases from assessed Question Answering corpora. Experts assessed the resulting corpus, and an analysis of that assessment is presented. The work was carried out for Spanish.

Similar resources

Presenting a Religious Question Answering Corpus in Persian

Question answering is a field of natural language processing and information retrieval that has attracted researchers' attention in recent decades. Given the growing interest in this field, there is a perceived need for appropriate data sources. Most research on developing question answering corpora has so far been done for English, but in other languages, such as Persian, the lack of these co...

Automatic Acquisition of Context-Specific Lexical Paraphrases

Lexical paraphrasing aims at acquiring word-level paraphrases. It is critical for many Natural Language Processing (NLP) applications, such as Question Answering (QA), Information Extraction (IE), and Machine Translation (MT). Since the meaning and usage of a word can vary in distinct contexts, different paraphrases should be acquired according to the contexts. However, most of the existing res...

Interlingual annotation of parallel text corpora: a new framework for annotation and evaluation

This paper focuses on an important step in the creation of a system of meaning representation and the development of semantically-annotated parallel corpora, for use in applications such as machine translation, question answering, text summarization, and information retrieval. The work described below constitutes the first effort of any kind to annotate multiple translations of foreign-language...

Squibs: On Paraphrase and Coreference

Paraphrase extraction and coreference resolution have applications in Question Answering, Information Extraction, Machine Translation, and so forth. Paraphrase pairs might be coreferential, and coreference relations are sometimes paraphrases. The two overlap considerably (Hirst 1981), but their definitions make them significantly different in essence: Paraphrasing concerns meaning, whereas core...

Using Multiple Metrics in Automatically Building Turkish Paraphrase Corpus

Paraphrasing is expressing similar meanings with different words in different order. In this sense it is viewed as translation in the same language. It is an important issue in natural language processing for automatic machine translation, question answering, text summarization and language generation. Studies in paraphrasing can be classified as paraphrase extraction, paraphrase generation, pa...

Journal title:
  • Procesamiento del Lenguaje Natural

Volume 39

Publication date: 2007